Topology Support (AKA Private Networking) #428 #694
Conversation
This will close #428
@kris-nova Trying to get my head around the code changes, looks good, however am curious what happens with multiple availability zones and instance groups. Might be worth documenting that a little further. A NAT gateway can only be attached to one subnet.
The changes look like they only apply one NAT gateway?
@ajohnstone - It's probably a bit earlier to be having design discussions around the code. Unfortunately, I am still working on putting the larger pieces together. It's a work in progress (WIP) Your point is valid, and we know that we will be needing more than 1 NGW in the long run.. Cheers |
Force-pushed 50dc524 to 2043349
Force-pushed 372598d to 42b44e6
Force-pushed e9e4a96 to 18cc365
Force-pushed 5e220db to 7f4004b
Looks good so far. Added some comments with the pitfalls I can see!
case api.TopologyHybrid1:
	cluster.Spec.Topology = &api.TopologySpec{Type: api.TopologyHybrid1}
default:
	glog.Warningf("Unable to detect topology. Defaulting to public topology.")
I think we should differentiate between empty and invalid here (to guard against typos)
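A hedged sketch of that distinction, using a hypothetical `resolveTopology` helper and standalone constants mirroring the `api.Topology*` values from the diff (not the actual kops code):

```go
package main

import "fmt"

// Standalone stand-ins for the api.Topology* constants in the diff.
const (
	TopologyPublic  = "public"
	TopologyPrivate = "private"
	TopologyHybrid1 = "hybrid1"
)

// resolveTopology treats an empty flag as "default to public", but rejects
// unrecognized values instead of silently falling back, guarding against typos.
func resolveTopology(flag string) (string, error) {
	switch flag {
	case "":
		return TopologyPublic, nil // not specified: defaulting is fine
	case TopologyPublic, TopologyPrivate, TopologyHybrid1:
		return flag, nil
	default:
		return "", fmt.Errorf("unknown topology %q", flag)
	}
}

func main() {
	t, _ := resolveTopology("")
	fmt.Println(t) // public
	_, err := resolveTopology("privte") // a typo now surfaces as an error
	fmt.Println(err != nil)             // true
}
```

The difference from the diff's `default:` branch is that only the empty string falls through to the public default; any other unknown value is an error.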
| Cluster Type    | Spec value | Description                                                                                                 |
| --------------- | ---------- | ----------------------------------------------------------------------------------------------------------- |
| Public Cluster  | public     | All masters/nodes will be launched in a **public subnet** in the VPC                                        |
| Private Cluster | private    | All masters/nodes will be launched in a **private subnet** in the VPC                                       |
| Hybrid (1)      | hybrid1    | All masters will be launched into a **private subnet**; all nodes will be launched into a **public subnet** |
Don't love this name. I realize it is only a shortcut name, but maybe something more self-descriptive like `privatemasterspublicnodes`. Although that is a terrible name ;-)
I honestly didn't like it either :) Will think on it and see what else we can come up with
case api.TopologyPrivate:
	cluster.Spec.Topology = &api.TopologySpec{Type: api.TopologyPrivate}
case api.TopologyHybrid1:
	cluster.Spec.Topology = &api.TopologySpec{Type: api.TopologyHybrid1}
I wonder if we should expand topology spec to have two fields - one for master & one for nodes. And then I wonder if this is in fact configured per instance-group... (though I don't actually understand the use-case of mixing right now)
@@ -343,8 +346,6 @@ func (c *Cluster) FillDefaults() error {
	// OK
} else if c.Spec.Networking.External != nil {
	// OK
} else if c.Spec.Networking.CNI != nil {
	// OK
I don't think you meant to delete this - a rebase thing?
Yep good catch - will fix :)
// Network Topology functions for template parsing
//
// Each of these functions can be used in the model templates
// The go template package currently only supports boolean
You can do `eq Topology "private"`, I think.
In general, it might be easier to replace the network.yaml with go code. I have a prototype of it somewhere that I'll paste a link to if I find it. But the whole reliance on templates just seems pretty fragile and seems to have turned out to be harder for people to change, not easier.
But completely up to you if you want to do this for this PR. Just if you're fighting templates you might find it easier to dump them :-)
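The built-in `eq` comparison the review mentions can be sketched like this (a minimal standalone example, not the actual kops template plumbing):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderTopology compares the topology string directly in the template
// with the built-in eq function, instead of registering a dedicated
// boolean helper function for each topology.
func renderTopology(topology string) string {
	const tmpl = `{{ if eq .Topology "private" }}private subnets{{ else }}public subnets{{ end }}`
	t := template.Must(template.New("net").Parse(tmpl))
	var buf bytes.Buffer
	t.Execute(&buf, struct{ Topology string }{topology})
	return buf.String()
}

func main() {
	fmt.Println(renderTopology("private")) // private subnets
	fmt.Println(renderTopology("public"))  // public subnets
}
```

`eq` has been a built-in of `text/template` since Go 1.2, so no custom function registration is needed for simple string comparisons.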
I think we should continue to use the templates and follow suit here - if we want to move away from them later we can, but I think that is a larger effort.
They work for now, so let's just keep it consistent.
response, err := t.Cloud.EC2().CreateNatGateway(request)
if err != nil {
	return fmt.Errorf("error creating Nat gateway: %v", err)
What I've tried doing (inconsistently) is referring to them by the class name, i.e. `NATGateway`, everywhere.
}

type terraformNatGateway struct {
	AllocationId *string `json:"AllocationID,omitempty"`
You need to match the names here: https://www.terraform.io/docs/providers/aws/r/nat_gateway.html - i.e. `json:"allocation_id,omitempty"`
}

func (e *NATGateway) TerraformLink() *terraform.Literal {
	return terraform.LiteralProperty("aws_natgateway", *e.AllocationID, "id")
aws_nat_gateway
	//SubnetID: e.SubnetID,
}

return t.RenderResource("aws_natgateway", *e.AllocationID, tf)
aws_nat_gateway
	"k8s.io/kops/upup/pkg/fi"
)

// VPC
Ugh ... did it generate this? whoops...
Hey @kris-nova, looking forward to this; I am up for testing this once it's out of WIP. Currently I am rejigging the terraform output to get the desired effect.
@billyshambrook if you can spend a few minutes once we are complete to work on a test plan - at least documenting how you are doing the testing - that would be AWESOME!!
Force-pushed b1ac2b2 to 0fde4cb
- Fixing topology.md (linting after review)
- Adding error message for a neglected --networking cni on private topologies
- Adding troubleshooting to documentation
- Fixing build errors (missed a privatemasters reference)
- Fixing the nil pointer problem in SG awstask
- Adding populateClusterSpec unit tests (topology happy/sad paths)
- Fleshing out topology in the buildMinimalCluster() function
Force-pushed bd8c59e to 8c41dad
@@ -40,67 +40,72 @@ type ClusterList struct {

type ClusterSpec struct {
	// The Channel we are following
	Channel string `json:"channel,omitempty"`
	Channel string `json:"channel,omitempty"`
Does this need a `gofmt`?
Yep! Fixed in 07eb92f
LGTM
What's the underlying cause for the reliance on weave rather than AWS routing? I'm particularly concerned about communication between two of these clusters federated across AWS regions.
@rbtcollins I don't know much about federation, but it's probably due to the routes limit in AWS route tables? See http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html#vpc-limits-route-tables
// This function is replacing existing yaml
func (tf *TemplateFunctions) GetBastionZone() (string, error) {
	var name string
	if len(tf.cluster.Spec.Zones) <= 1 {
@kris-nova I know this has been merged, but just bumped into this 2 zone requirement on a test, non-HA, deploy. Thought I'd bring it up here in case it's an off-by-one error. More likely that I'm missing an obvious reason for it though :)
Hey @robinpercy - can you provide a little more detail on how we could replicate this? Did you actually run into a scenario where this broke and you were unable to successfully deploy a cluster?
I think the logic here should support zone lists of length 1.
Happy to get a patch out for this ASAP if we can just confirm this is the concern.
Thanks for catching this one!
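If it is indeed an off-by-one, the fix might look like the following hedged sketch (`zoneNames` is a simplified stand-in for `tf.cluster.Spec.Zones`, not the real kops signature):

```go
package main

import "fmt"

// getBastionZone sketches the corrected zone check: selecting a bastion
// zone should work even when only one zone is configured. The reported
// bug used `<= 1`, which rejected single-zone clusters.
func getBastionZone(zoneNames []string) (string, error) {
	if len(zoneNames) < 1 {
		return "", fmt.Errorf("unable to detect zone name for bastion")
	}
	// With one zone this is the only choice; with more, pick the first.
	return zoneNames[0], nil
}

func main() {
	zone, err := getBastionZone([]string{"us-west-1c"})
	fmt.Println(zone, err) // us-west-1c <nil>
}
```

Under this version, the single-zone `--zones=us-west-1c` invocation from the report below would no longer fail in the bastion template.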
Hey @kris-nova. Yeah, I ran into this when only specifying a single zone during cluster creation:
$ kops create cluster --topology=private --networking=cni --zones=us-west-1c --ssh-public-key=${MYKEY} --cloud=aws "${NAME}"
I1109 22:03:08.390730 4441 cluster.go:498] Assigned CIDR 172.20.96.0/19 to zone us-west-1c
I1109 22:03:08.392186 4441 cluster.go:509] Assigned Private CIDR 172.20.128.0/19 to zone us-west-1c
I1109 22:03:09.759130 4441 populate_cluster_spec.go:237] Defaulting DNS zone to: Z1HBLAHBLAHBLAHDF
Previewing changes that will be made:
error building tasks: error handling file "cloudup/_aws/topologies/_topology_private/bastion.yaml": error executing template "_aws/topologies/_topology_private/bastion.yaml": template: _aws/topologies/_topology_private/bastion.yaml:117:24: executing "_aws/topologies/_topology_private/bastion.yaml" at <GetBastionZone>: error calling GetBastionZone: Unable to detect zone name for bastion
Let me know if I should create a ticket.
It's not all that critical for me. I'm just spinning up small test clusters to see how the private topology can work for us. And I'm happy to do that from the PR branch for the time being. Thanks for the quick response on this.
@rbtcollins @chulkilee it's because Kubernetes is hard-coded to only handle one route table for node routing updates in AWS. Without modifying Kubernetes code, this project is forced to use an overlay network to provide a functioning private network configuration.
@erutherford ok, so what's the issue # in kubernetes itself? kops is part of the ecosystem; k8s isn't a fixed quantity to consume, it's part of the solution.
@rbtcollins I did a quick scan for tickets and didn't see one, so I don't know of an issue that's open yet. I was merely explaining my understanding of why weave was used for this iteration. I believe this is the code in question.
Ok; should I file a ticket then?
Network topologies in kops
Use Case: Private networking in AWS with Kubernetes defined in #428
Example Usage: Proposed Documentation
Summary

kops create cluster --topology public|private
kops create cluster -t public|private